Proceedings of the ECML / PKDD – 2003

Authors

  • Bettina Berendt
  • Andreas Hotho
  • Dunja Mladenic
  • Maarten van Someren
  • Myra Spiliopoulou
  • Gerd Stumme
  • Luis Torgo
  • Marko Grobelnik
  • John R. Punin
Abstract

This paper presents a novel method for extracting information from collections of Web pages across different sites. Our method uses a standard wrapper induction algorithm and exploits named entity information. We introduce the idea of post-processing the extraction results to resolve ambiguous facts and improve overall extraction performance. Post-processing exploits two additional sources of information: fact transition probabilities, based on a trained bigram model, and confidence probabilities, estimated for each fact by the wrapper induction system. A multiplicative model based on the product of these two probabilities is also considered for post-processing. Experiments were conducted on pages describing laptop products, collected from many different sites and in four different languages. The results highlight the effectiveness of our approach.
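
The multiplicative post-processing idea can be illustrated with a small sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the fact labels, the toy training sequences, the add-one smoothing, the `<s>` start symbol, and the Viterbi-style search are hypothetical choices, not the authors' method. The sketch scores each candidate fact by the product of a bigram transition probability and a per-fact confidence, then keeps the highest-scoring sequence.

```python
from collections import defaultdict

START = "<s>"  # hypothetical start-of-page symbol


def train_bigram(sequences, smoothing=1.0):
    """Estimate P(next_fact | prev_fact) from labelled fact sequences.

    Uses additive smoothing (an assumption; the abstract does not say
    how the bigram model is estimated).
    """
    labels = sorted({fact for seq in sequences for fact in seq})
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        seq = list(seq)
        for prev, nxt in zip([START] + seq[:-1], seq):
            counts[prev][nxt] += 1.0

    def transition(prev, nxt):
        total = sum(counts[prev].values()) + smoothing * len(labels)
        return (counts[prev][nxt] + smoothing) / total

    return transition


def resolve(candidates, transition):
    """Pick the fact sequence maximising the product of
    transition probability and confidence at every position.

    candidates: list of dicts mapping candidate fact -> confidence.
    """
    # best[fact] = (score, path) for the best path ending in `fact`
    best = {START: (1.0, [])}
    for options in candidates:
        new_best = {}
        for fact, conf in options.items():
            score, path = max(
                ((s * transition(prev, fact) * conf, p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[fact] = (score, path + [fact])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy training data: fact sequences as they might appear on laptop pages.
training = [
    ["model", "processor", "ram", "hdd"],
    ["model", "processor", "ram", "hdd"],
    ["model", "ram", "hdd"],
]
transition = train_bigram(training)

# Two ambiguous capacity values: the confidences alone slightly
# favour the order hdd, ram.
candidates = [
    {"processor": 0.9},
    {"ram": 0.45, "hdd": 0.55},
    {"ram": 0.55, "hdd": 0.45},
]
result = resolve(candidates, transition)
print(result)
```

On this toy input, picking the most confident label at each position would yield processor, hdd, ram, whereas the product with the transition probabilities selects processor, ram, hdd, the order seen in the training sequences. A greedy left-to-right choice would be simpler, but a dynamic program avoids committing to an early label that a later transition makes unlikely.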

Similar resources

Introduction: special issue of the ECML /

This special issue is a collection of papers submitted to the ECML/PKDD 2013 and 2014 journal tracks and accepted for publication in “Machine Learning”. The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD, launched its journal track in 2013. In order to cover the full scope of the conference, which is a merger of the formerly independe...

Reliability Maps: A Tool to Enhance Probability Estimates and Improve Classification Accuracy

Probability Estimates and Improve Classification Accuracy (Best paper award). In T. Calders, F. Esposito, E. Hüllermeier, & R. Meo (Eds.), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2014, Nancy, France, September 15-19, 2014. Proceedings, Part II (pp. 18-33). (Lecture Notes in Artificial Intelligence; Vol. 8725). Springer Berlin Heidelberg, 2014. DOI: ...

Machine Learning: ECML 2005, 16th European Conference on Machine Learning, Porto, Portugal, October 3-7, 2005, Proceedings

It sounds good when knowing the machine learning ecml 2005 16th european conference on machine learning porto portugal october 3 7 2005 proceedings lecture notes in computer in this website. This is one of the books that many people are looking for. In the past, many people asked about this book as their favourite book to read and collect. And now, we present what you need quickly. It seems to be so h...

MADSPAM Consortium at the ECML/PKDD Discovery

We present here the contribution of the MADSPAM consortium to the ECML/PKDD Discovery Challenge 2010. The submitted method is based on both a RankBoost algorithm and propagation techniques.

Multi-Plant Photovoltaic Energy Forecasting Challenge: Second Place Solution

This paper presents the approach we took to solve the Multi-Plant Photovoltaic Energy Forecasting Challenge for ECML/PKDD 2017, which earned us second place in that challenge. We describe how we moved from standard regression techniques to simple function optimization to tackle the challenge.

Publication date: 2003